Annotating Resources for Information Extraction

Authors

Sean Boisen

Michael Crystal

Richard M. Schwartz

Rebecca Stone

Ralph M. Weischedel

Abstract

Trained systems for Named Entity (NE) extraction have shown significant promise because of their robustness to errorful input and rapid adaptability. However, these learning algorithms have transferred the cost of development from skilled computational linguistic expertise to data annotation, putting a new premium on effective ways to produce high-quality annotated resources at minimal cost. The paper reflects on BBN’s four years of experience annotating training data for NE extraction systems, discussing useful techniques for maximizing data quality and quantity.

Similar resources

Classifying articles in English and German Wikipedia

Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, the cost of manually annotating sufficient data for training purposes, especially for multiple languages, is prohibitive, meaning automated methods for developing resources are crucial. We investigate the automatic generation of NE annotated data in German from Wikipedia. By incorporating structural featur...

Annotating Events and Temporal Information in Newswire Texts

If one is concerned with natural language processing applications such as information extraction (IE), which typically involve extracting information about temporally situated scenarios, the ability to accurately position key events in time is of great importance. To date only minimal work has been done in the IE community concerning the extraction of temporal information from text, an...

Guidelines for Annotating Temporal Information

This paper introduces a set of guidelines for annotating time expressions with a canonicalized representation of the times they refer to. Applications that can benefit from such an annotated corpus include information extraction (e.g., normalizing temporal references for database entry), question answering (answering “when” questions), summarization (temporally ordering information), machine tr...

Implementation and Evaluation of a Negation Tagger in a Pipeline-based System for Information Extraction from Pathology Reports

We have developed a pipeline-based system for automated annotation of Surgical Pathology Reports with UMLS terms that builds on GATE--an open-source architecture for language engineering. The system includes a module for detecting and annotating negated concepts, which implements the NegEx algorithm--an algorithm originally described for use in discharge summaries and radiology reports. We desc...

Annotating and Recognizing Event Modality in Text

Current results in basic Information Extraction tasks such as Named Entity Recognition or Event Extraction suggest that we are close to achieving a stage where the fundamental units for text understanding are put together; namely, predicates and their arguments. However, other layers of information, such as event modality, are essential for understanding, since the inferences derivable from fac...

Publication date: 2000
